../refvcf/1000G_phase1.snps.high_confidence.hg38.vcf.gz \
--resource:dbsnp,known=false,training=true,truth=false,prior=2.0 \
../refvcf/Homo_sapiens_assembly38.dbsnp138.vcf \
-an QD -an MQ -an MQRankSum \
-an ReadPosRankSum -an FS -an SOR \
-mode SNP \
-O ../VQSR/tranches.out \
--tranches-file ../VQSR/vqsrplots.R
cd ..
(ii) Applying recalibration rules:
Next, we apply the model to the variants with ApplyVQSR, which filters them by a tranche
sensitivity threshold. Variants whose VQSLOD is above the threshold get PASS in the FILTER
field, while variants whose VQSLOD is below the threshold are labeled with the name of
the tranche they fall into.
#Apply VQSR
mkdir filteredVCF
~/software/gatk-4.2.3.0/gatk \
--java-options -Xmx4G ApplyVQSR \
-V vcf/allsamplesSNP_chr21.vcf \
-O filteredVCF/humanSNP.vcf \
--truth-sensitivity-filter-level 99.7 \
--tranches-file VQSR/vqsrplots.R \
--recal-file VQSR/tranches.out \
-mode SNP \
--create-output-variant-index true
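A quick sanity check after ApplyVQSR is to tally the FILTER column of the output. The sketch below is only an illustration: it builds a tiny toy VCF with made-up records so that it runs without GATK, and the helper name count_filters is ours, not part of GATK. On real data, the function could be pointed at filteredVCF/humanSNP.vcf instead.

```shell
# Toy VCF with made-up records (hypothetical data, for illustration only).
toy_vcf=$(mktemp)
{
  printf '##fileformat=VCFv4.2\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf 'chr21\t100\t.\tA\tG\t50\tPASS\tVQSLOD=5.10\n'
  printf 'chr21\t200\t.\tC\tT\t40\tPASS\tVQSLOD=3.20\n'
  printf 'chr21\t300\t.\tG\tA\t20\tVQSRTrancheSNP99.70to99.90\tVQSLOD=-1.50\n'
} > "$toy_vcf"

# Count how many records carry each FILTER value (column 7),
# skipping header lines that start with '#'.
count_filters() {
  awk -F'\t' '!/^#/ { n[$7]++ } END { for (f in n) print f "\t" n[f] }' "$1" | sort
}

count_filters "$toy_vcf"
```

A large share of records labeled with a tranche name rather than PASS would suggest revisiting the sensitivity threshold or the training resources.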
We can follow the same steps for filtering InDels.
The downside of the VQSR approach is that it is not applicable to all organisms, because
high-quality variant datasets for training are available only for human and a few model
organisms. As an alternative to VQSR, we can use hard filtering with the VariantFiltration
tool, which applies fixed thresholds to one or more annotations such as QD (QualByDepth),
QUAL (Phred-scaled variant quality), SOR (StrandOddsRatio), FS (FisherStrand), MQ
(RMSMappingQuality), and MQRankSum (MappingQualityRankSumTest). For these filters, use
the thresholds recommended by GATK4. If a variant passes every filter, PASS is added to
the FILTER field of the VCF file; otherwise, the name of each failed filter is added
instead. GATK4 recommends one set of filters for SNPs and another for InDels. The
following scripts are for hard filtering SNPs and InDels.
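Before the GATK commands, the labeling rule itself can be sketched without GATK. This is a minimal illustration, not how VariantFiltration is implemented: it applies a single rule, QD < 2.0, to a tiny made-up VCF using awk. The function name hard_filter_qd, the filter label QD2, and the toy records are assumptions for illustration only.

```shell
# Toy VCF with made-up records (hypothetical data, for illustration only).
toy=$(mktemp)
{
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf 'chr21\t100\t.\tA\tG\t90\t.\tQD=12.5;FS=1.2\n'
  printf 'chr21\t200\t.\tC\tT\t30\t.\tQD=1.4;FS=0.8\n'
} > "$toy"

# Usage: hard_filter_qd <vcf> <min_qd> <filter_name>
# Writes <filter_name> to FILTER when QD is below <min_qd>, otherwise PASS.
# Records without a QD annotation are treated as passing in this sketch.
hard_filter_qd() {
  awk -F'\t' -v OFS='\t' -v min_qd="$2" -v label="$3" '
    /^#/ { print; next }              # copy header lines unchanged
    {
      qd = ""
      n = split($8, kv, ";")          # INFO is a ;-separated list of KEY=VALUE
      for (i = 1; i <= n; i++)
        if (kv[i] ~ /^QD=/) qd = substr(kv[i], 4)
      if (qd != "" && qd + 0 < min_qd + 0) $7 = label; else $7 = "PASS"
      print
    }' "$1"
}

hard_filter_qd "$toy" 2.0 QD2
```

In this toy input the record at position 200 (QD=1.4) is labeled QD2 and the record at position 100 (QD=12.5) is labeled PASS. With several rules, a failing record would carry the names of all filters it failed, which this one-rule sketch does not attempt to reproduce.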
#GATK best-practice hard filtering for SNPs
~/software/gatk-4.2.3.0/gatk \
--java-options \
-Xmx10G VariantFiltration \
-V vcf/allsamplesSNP_chr21.vcf \
-filter "QD<2.0" \